A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems

Internal identifier: 001E68 (Main/Exploration); previous: 001E67; next: 001E69

Authors: Khaled Mostafa [Egypt]; I. Shaheen [Egypt]; M. Darwish [Egypt]; Ibrahim Farag [Egypt]

Source:

Lecture Notes in Computer Science [ISSN 0302-9743]; 1999.

RBID : ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA

French descriptors

Pascal (Inist)
- Arabe, Reconnaissance forme, Reconnaissance optique caractère, Reconnaissance écriture, Système intelligent.

English descriptors

KwdEn :
- Arabic, Handwriting recognition, Intelligent system, Optical character recognition, Pattern recognition.

Abstract

In this paper, we propose a new approach for detecting and correcting segmentation and recognition errors in Arabic OCR systems. The approach is suitable for both typewritten and handwritten script recognition systems. Error detection is based on rules of the Arabic language and a morphological analyzer. This type of analysis keeps the dictionary to a practical size: a complete dictionary of roots (no more than 5641 entries), the morphological rules, and all valid patterns fit in a moderately sized file. Recognition channel characteristics are modeled using a set of probabilistic finite state machines. Contextual information is used in the form of transition probabilities between letters of a previously defined vocabulary (finite lexicon) and transition probabilities of garbled text. The developed detection and correction modules have been incorporated as a post-processing phase in an Arabic handwritten cursive script recognition system. Experimental results show a considerable improvement in performance.
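
The abstract combines a recognition-channel model with letter transition probabilities from a finite lexicon. The Python sketch below illustrates that general idea only; it is not the authors' implementation. It assumes a substitution-only channel (segmentation errors that merge or split letters are not handled), uses plain lexicon lookup instead of a morphological analyzer, and every word, letter pair, and probability in it is a hypothetical toy value.

```python
import math
from collections import Counter

# Toy transliterated lexicon; hypothetical, not taken from the paper.
LEXICON = ["kitab", "katib", "kalam", "qalam", "maktab"]

# Letter pairs assumed easy to confuse (in Arabic script, ba/ta and fa/qaf
# differ mainly in their dots). The probabilities below are made up.
CONFUSABLE = {frozenset("bt"), frozenset("fq")}
P_SAME, P_CONFUSED, P_OTHER = 0.9, 0.08, 0.001


def channel_logp(observed: str, intended: str) -> float:
    """log P(observed letter | intended letter), substitution-only channel."""
    if observed == intended:
        return math.log(P_SAME)
    if frozenset((observed, intended)) in CONFUSABLE:
        return math.log(P_CONFUSED)
    return math.log(P_OTHER)


def train_bigrams(lexicon):
    """Estimate add-one-smoothed letter transition log-probabilities."""
    pair_counts, prev_counts = Counter(), Counter()
    alphabet = set("".join(lexicon))
    for word in lexicon:
        # "^" marks the word start, so the first letter also has a context.
        for prev, cur in zip("^" + word, word):
            pair_counts[prev, cur] += 1
            prev_counts[prev] += 1

    def logp(prev: str, cur: str) -> float:
        return math.log(
            (pair_counts[prev, cur] + 1) / (prev_counts[prev] + len(alphabet))
        )

    return logp


def correct(observed: str, lexicon=LEXICON):
    """Return the best-scoring same-length lexicon word, or None if there is none."""
    transition_logp = train_bigrams(lexicon)
    best_word, best_score = None, -math.inf
    for word in lexicon:
        if len(word) != len(observed):
            continue  # substitution-only: no merged or split letters
        # Channel term: how likely the recognizer turned `word` into `observed`.
        score = sum(channel_logp(o, c) for o, c in zip(observed, word))
        # Context term: how plausible `word` is under the letter bigram model.
        score += sum(transition_logp(p, c) for p, c in zip("^" + word, word))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    print(correct("kitat"))
```

With these toy values, `correct("kitat")` selects `kitab`, because a single b/t confusion is far cheaper under the channel model than the several unrelated substitutions the other candidates would require.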

Url:

https://api.istex.fr/document/3E3F186B8873FE74B41C4CF4826422234F1E68BA/fulltext/pdf

DOI: 10.1007/978-3-540-48765-4_57

Affiliations:

Egypt

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems</title>
<author><name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
</author>
<author><name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
</author>
<author><name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
</author>
<author><name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA</idno>
<date when="1999" year="1999">1999</date>
<idno type="doi">10.1007/978-3-540-48765-4_57</idno>
<idno type="url">https://api.istex.fr/document/3E3F186B8873FE74B41C4CF4826422234F1E68BA/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000546</idno>
<idno type="wicri:Area/Istex/Curation">000539</idno>
<idno type="wicri:Area/Istex/Checkpoint">001419</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Mostafa K:a:novel:approach</idno>
<idno type="wicri:Area/Main/Merge">001F77</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:99-0397539</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000813</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B81</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000799</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Mostafa K:a:novel:approach</idno>
<idno type="wicri:Area/Main/Merge">002180</idno>
<idno type="wicri:Area/Main/Curation">001E68</idno>
<idno type="wicri:Area/Main/Exploration">001E68</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems</title>
<author><name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Information Technology Department, Faculty of Computers and Information, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Computer Engineering Department, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author><name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Computer Engineering Department, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author><name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
<affiliation wicri:level="1"><country xml:lang="fr">Égypte</country>
<wicri:regionArea>Institute of Statistical Studies and Research, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>1999</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">3E3F186B8873FE74B41C4CF4826422234F1E68BA</idno>
<idno type="DOI">10.1007/978-3-540-48765-4_57</idno>
<idno type="ChapterID">57</idno>
<idno type="ChapterID">Chap57</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Arabic</term>
<term>Handwriting recognition</term>
<term>Intelligent system</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Arabe</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance écriture</term>
<term>Système intelligent</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper, we propose a new approach for detecting and correcting segmentation and recognition errors in Arabic OCR systems. The approach is suitable for both typewritten and handwritten script recognition systems. Error detection is based on rules of the Arabic language and a morphology analyzer. This type of analysis has the advantage of limiting the size of the dictionary to a practical size. Thus, a complete dictionary for roots, which does not exceed 5641 roots, the morphological rules and all valid patterns can be kept in a moderate size file. Recognition channel characteristics are modeled using a set of probabilistic finite state machines. Contextual information is utilized in the form of transitional probabilities between letters of previously defined vocabulary (finite lexicon) and transitional probabilities of garbled text. The developed detection and correction modules have been incorporated as a post-processing phase in an Arabic handwritten cursive script recognition system. Experimental results show a considerable enhancement in performance.</div>
</front>
</TEI>
<affiliations><list><country><li>Égypte</li>
</country>
</list>
<tree><country name="Égypte"><noRegion><name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
</noRegion>
<name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
<name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
<name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
</country>
</tree>
</affiliations>
</record>
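
As a rough illustration of how the record above could be read programmatically, the Python sketch below extracts the title, authors, and DOI from a shortened excerpt using the standard library. The record as displayed does not declare a binding for the `wicri:` prefix, so a namespace-aware parser will probably reject it unchanged; the sketch therefore injects a placeholder namespace URI (`urn:example:wicri`, an assumption made purely for illustration, not an official identifier). The function names are hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the record (two authors, title, DOI), with fewer attributes.
RECORD = """<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems</title>
<author><name sortKey="Mostafa, Khaled" first="Khaled" last="Mostafa">Khaled Mostafa</name></author>
<author><name sortKey="Shaheen, I" first="I." last="Shaheen">I. Shaheen</name></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1007/978-3-540-48765-4_57</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

# Placeholder URI so the "wicri:" prefix is bound; an assumption, not official.
WICRI_NS = "urn:example:wicri"


def parse_record(text: str) -> ET.Element:
    """Bind the wicri prefix on the root element, then parse the record."""
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(patched)


def summarize(text: str) -> dict:
    """Collect the title, author names, and DOI from a record string."""
    root = parse_record(text)
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [name.text for name in root.findall(".//titleStmt/author/name")],
        "doi": root.findtext(".//idno[@type='doi']"),
    }


if __name__ == "__main__":
    for key, value in summarize(RECORD).items():
        print(f"{key}: {value}")
```

The same paths should work on the full record, since its elements carry no default namespace; only the prefixed `wicri:` attributes and elements need the added binding.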

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001E68 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001E68 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA
   |texte=   A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.